Arkil Patel

I am a PhD student in Computer Science at Mila and McGill, where I am supervised by Prof. Dzmitry Bahdanau and Prof. Siva Reddy. Previously, I spent 2.5 amazing years as a Research Fellow at Microsoft Research India, where I worked with Dr. Navin Goyal. I also interned with the AllenNLP team at the Allen Institute for Artificial Intelligence (AI2), where I worked with Pradeep Dasigi on evaluating code generation in LLMs.

I do research in Machine Learning, with a focus on Large Language Models (LLMs). My work aims to build a principled and predictive understanding of how LLMs behave across varying data and training regimes. My goal is to uncover general laws that help explain when and why capabilities emerge, how they transfer or fail out of distribution, and which signals reliably anticipate downstream performance. I aim to leverage these insights for forecasting model behaviour and for informing the design of more reliable LLMs.

Keywords: generalization, scaling, reasoning, evaluation, safety, analysis and interpretability

I graduated with a B.E. (Hons.) in Computer Science from BITS Pilani - Goa Campus, India in 2020. For more details about my background, refer to my CV. If you'd like to chat with me about my work or research in general, feel free to reach out!

News

Jan 12, 2026

Our Thoughtology paper investigating the reasoning chains of thought of Large Reasoning Models like DeepSeek-R1 has been published at TMLR!

May 01, 2025

Our paper proposing SafeArena, a benchmark for evaluating the safety of autonomous web agents, has been accepted at ICML 2025!

Mar 30, 2025

Our paper on AI safety investigating the transferability of adversarial triggers in LLMs has been accepted to TACL!

Mar 22, 2025

I'm a visiting graduate student at the Simons Institute at UC Berkeley as a part of their special year on LLMs and Transformers.

Feb 20, 2025

Our paper proposing the CHASE method to automatically generate challenging synthetic data for evaluating LLMs is out!

Jun 16, 2024

Presented my AI2 internship work on evaluating code generation in LLMs at NAACL 2024 in Mexico City!

Publications

Google Scholar | Semantic Scholar

How to Get Your LLM to Generate Challenging Problems for Evaluation
Arkil Patel, Siva Reddy, Dzmitry Bahdanau
Preprint
pdf code abstract

DeepSeek-R1 Thoughtology: Let’s think about LLM reasoning
Sara Vera Marjanović*, Arkil Patel*, Vaibhav Adlakha, Milad Aghajohari, Parishad BehnamGhader, Mehar Bhatia, Aditi Khandelwal, Austin Kraft, Benno Krojer, Xing Han Lù, Nicholas Meade, Dongchan Shin, Amirhossein Kazemnejad, Gaurav Kamath, Marius Mosbach, Karolina Stańczak, Siva Reddy
TMLR'26
pdf code abstract

AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories
Xing Han Lù, Amirhossein Kazemnejad, Nicholas Meade, Arkil Patel, Dongchan Shin, Alejandra Zambrano, Karolina Stańczak, Peter Shaw, Christopher Pal, Siva Reddy
CoLM'25
pdf code abstract

SafeArena: Evaluating the Safety of Autonomous Web Agents
Ada Defne Tur, Nicholas Meade, Xing Han Lù, Alejandra Zambrano, Arkil Patel, Esin Durmus, Spandana Gella, Karolina Stańczak, Siva Reddy
ICML'25
pdf code abstract

Universal Adversarial Triggers Are Not Universal
Nicholas Meade, Arkil Patel, Siva Reddy
TACL'25
pdf code abstract

Evaluating In-Context Learning of Libraries for Code Generation
Arkil Patel, Siva Reddy, Dzmitry Bahdanau, Pradeep Dasigi
NAACL'24
pdf code abstract

Understanding In-Context Learning in Transformers and LLMs by Learning to Learn Discrete Functions
Satwik Bhattamishra, Arkil Patel, Phil Blunsom, Varun Kanade
ICLR'24 [Oral]
pdf code abstract

MAGNIFICo: Evaluating the In-Context Learning Ability of Large Language Models to Generalize to Novel Interpretations
Arkil Patel, Satwik Bhattamishra, Siva Reddy, Dzmitry Bahdanau
EMNLP'23 [Oral]
pdf code abstract

Simplicity Bias in Transformers and their Ability to Learn Sparse Boolean Functions
Satwik Bhattamishra, Arkil Patel, Varun Kanade, Phil Blunsom
ACL'23
pdf code abstract

When Can Transformers Ground and Compose: Insights from Compositional Generalization Benchmarks
Ankur Sikarwar, Arkil Patel, Navin Goyal
EMNLP'22 [Oral]
pdf code abstract

Revisiting the Compositional Generalization Abilities of Neural Sequence Models
Arkil Patel, Satwik Bhattamishra, Phil Blunsom, Navin Goyal
ACL'22
pdf code abstract

Are NLP Models really able to Solve Simple Math Word Problems?
Arkil Patel, Satwik Bhattamishra, Navin Goyal
NAACL'21
pdf code abstract article

On the Computational Power of Transformers and its Implications in Sequence Modeling
Satwik Bhattamishra, Arkil Patel, Navin Goyal
CoNLL'20
pdf code abstract

VehicleChain: Blockchain-based Vehicular Data Transmission Scheme for Smart City
Arkil Patel, Naigam Shah, Trupil Limbasiya, Debasis Das
IEEE SMC'19
pdf

Service

Teaching

Winter 2026: Teaching Assistant for COMP 767: Large Language Models - McGill University
Fall 2024: Teaching Assistant for COMP 767: Large Language Models - McGill University
Winter 2024: Teaching Assistant for COMP 596: From Natural Language to Data Science - McGill University
Winter 2023: Teaching Assistant for COMP 596: From Natural Language to Data Science - McGill University
Winter 2020: Teaching Assistant for BITS F312: Neural Networks and Fuzzy Logic - BITS Goa
Winter 2019: Teaching Assistant for CS F415: Data Mining - BITS Goa

Reviewer: ICML, ICLR, NeurIPS, CoLM, ACL Rolling Review

BITS Pilani

2016 - 2020

Microsoft Research India

2019 - 2022

Allen Institute for AI

Summer 2023

Mila - Quebec AI Institute

2022 - Present

McGill University

2022 - Present
